Motivations

As new and returning residents of New York, we are intrigued by the bustling rat population around us and across each borough. We are interested in exploring the impact that factors like day of the week, month, borough, latitude, longitude, and type of building have on rat sightings around New York. By understanding what influences the number and location of rat sightings, we will know which areas of the city to avoid and which areas may have the world’s best ratatouille.

Background: History of Rats in NYC

Brown rats are not indigenous to New York; they are a product of the colonization of North America and were brought over on ships in the 18th century. The brown rat is the most common variety in New York today, having slowly overtaken the black rat population thanks to its highly aggressive and dominant nature. Brown rats’ ability to tunnel through and eat just about anything has led them to be one of the top pests in New York for centuries. A testament to their cultural relevance, the Rolling Stones’ 1978 song “Shattered” made a reference to the rats of New York City: “We’ve got rats on the west side”. More recently, these omnivorous creatures have even spurred the creation of a city government position titled Director of Rodent Mitigation or, colloquially, “Rat Czar”.

Initial Questions

We set out to answer the following questions:

  • How do rat sightings vary over time (month, day of the week, year)?
  • How do rat sightings vary by borough and where are they concentrated?
  • What are the important factors in predicting a rat sighting location?

Throughout the course of working on the project, we became interested in sightings year-over-year and included some time series plots in the analysis.

Data

Our data is publicly available from Open Data NYC (https://data.cityofnewyork.us/Social-Services/Rat-Sightings/3q43-55fe) and was downloaded in November 2023. The raw data contains 232,090 records of rat sightings, with variables relating to geographical location, type of location, and time of sighting. To begin the data cleaning and analysis process, we loaded the following libraries:

  • tidyverse
  • lubridate
  • readr
  • xts
  • RColorBrewer
  • ggthemes
  • gridExtra
  • leaflet
  • leaflet.extras
  • highcharter
  • scales
library(tidyverse)
library(lubridate)
library(readr) 
library(xts)
library("RColorBrewer")
library("ggthemes")
library("gridExtra")
library("leaflet")
library(leaflet.extras)
library("highcharter")
library(scales)

Importing and Cleaning

We begin by importing the rat sightings data with the read_csv function, cleaning up the variable names with the clean_names function from the janitor package, and creating some more useful date variables in a mutate pipeline.

rats_raw <- read_csv("./Rat_Sightings.csv", na = c("", "NA", "N/A", "Unspecified")) %>%
  janitor::clean_names() %>% 
  mutate(created_date = mdy_hms(created_date)) %>%
  mutate(sighting_year = year(created_date),
         sighting_month_num = month(created_date),
         sighting_month = month(created_date, label = TRUE, abbr = FALSE),
         sighting_day = day(created_date),
         sighting_weekday = wday(created_date, label = TRUE, abbr = FALSE)) 

There are 232,090 records of rat sightings, ranging from 2010 to 2023 and across all 5 boroughs.

Important variables to our analysis include:

  • created_date: Date of rat sighting record
  • sighting_year: Year of sighting
  • sighting_month: Month of sighting
  • sighting_day: Sighting day of the month
  • sighting_weekday: Sighting day of the week
  • location_type: Rat sighting location type (Government Building, 3+ Family Apt. Building, Construction site, etc.)
  • city: City of sighting
  • borough: Borough of sighting
  • latitude: Latitude of sighting
  • longitude: Longitude of sighting

Exploratory Analyses

We first explored how rat sightings vary over time (month, day of the week, year) and how rat sightings vary by borough. To do so, we used simple tables, bar charts, line plots, heat maps, and interactive maps. First, we visualize rat sightings by borough, different measures of time (year, month, and day of the week), and the different types of locations they are found at.

Rat Sightings by Borough

by_borough <- rats_raw %>% 
  filter(!is.na(borough)) %>%
  group_by(borough) %>% 
  count() %>% 
  ggplot(aes(x = borough, y = n, fill = n)) + 
  geom_histogram(stat = "identity", position = "dodge") +
  theme(legend.position ='none',axis.title = element_text(),axis.text.x = element_text(size = 12)) +
  xlab("Borough") + 
  ylab("Count") +
  geom_text(aes(label = n), vjust = -0.1, size = 3.75) +
  ggtitle('Count of Rat Sightings by Borough') + 
  scale_fill_gradientn(name = '',colours = rev(brewer.pal(10,'Spectral'))) 

by_borough

rat_borough <-
rats_raw %>%
  group_by(borough, sighting_month, sighting_year) %>%
  mutate(rat_per_month = n()) %>%
  slice(1) %>%
  select(borough, sighting_month, sighting_year, rat_per_month)

test_borough <- broom::tidy(oneway.test(rat_per_month ~ borough, data = rat_borough))

We see that Brooklyn has the highest number of rat sightings, followed by Manhattan, the Bronx, Queens, and Staten Island. Running a one-way ANOVA test (oneway.test, which by default does not assume equal variances) on the number of rat sightings per month and year by borough, we see that the p-value is 3.7065548 × 10^-147, so at least one of the borough means is not equal to the others.
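As a hedged illustration of the test used above, the sketch below runs oneway.test on made-up monthly counts. The toy data frame, its group labels, and its values are assumptions for illustration only and are not taken from the rat data. It compares the default (Welch) version with the classic equal-variance version and prints a very small p-value in scientific notation.

```r
# Toy monthly counts for three groups; values are invented for illustration.
set.seed(1)
toy <- data.frame(
  borough = rep(c("A", "B", "C"), each = 24),
  rat_per_month = c(rpois(24, 60), rpois(24, 50), rpois(24, 40))
)

# Default: Welch's one-way ANOVA (does not assume equal variances).
welch <- oneway.test(rat_per_month ~ borough, data = toy)

# Classic one-way ANOVA (assumes equal variances across groups).
classic <- oneway.test(rat_per_month ~ borough, data = toy, var.equal = TRUE)

# Scientific notation avoids ambiguous renderings of very small p-values.
p_text <- format(welch$p.value, scientific = TRUE, digits = 3)
print(welch$method)
print(classic$method)
print(p_text)
```

Both versions test whether all group means are equal; they differ only in how the within-group variance is handled.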

Rat Sightings by Year

by_year <- rats_raw %>% 
  group_by(sighting_year) %>% 
  count() %>% 
  ggplot(aes(x = sighting_year, y = n, fill = n)) + 
  geom_histogram(stat = "identity", position = "dodge") +
  theme(legend.position ='none',axis.title = element_text(),axis.text.x = element_text(size = 12)) +
  xlab("Year") + 
  ylab("Count") +
  geom_text(aes(label = n), vjust = -0.1, size = 3.75) +
  ggtitle('Count of Rat Sightings through the Years') + 
  scale_fill_gradientn(name = '',colours = rev(brewer.pal(10,'Spectral'))) 

by_year

rat_year <-
rats_raw %>%
  group_by(sighting_month, sighting_year) %>%
  mutate(rat_per_month = n()) %>%
  slice(1) %>%
  select(sighting_month, sighting_year, rat_per_month)

test_year <- broom::tidy(oneway.test(rat_per_month ~ sighting_year, data = rat_year))

We see a substantial increase in the number of rat sightings after 2020. This increase is consistent with media coverage of New York City’s rat problem and with the impact of the COVID-19 pandemic. With more restaurants closed and more restaurants offering outdoor dining, rats are more likely to scavenge outside. A warmer, wetter than usual summer in 2021 also contributed to favorable rat conditions. Running a one-way ANOVA test on the number of rat sightings (per month) by year, we see that the p-value is 3.9316995 × 10^-13, so at least one of the year means is not equal to the others.

Rat Sightings by Month

by_month <- rats_raw %>% 
  group_by(sighting_month) %>% 
  count() %>% 
  ggplot(aes(x = sighting_month, y = n, fill = n)) + 
  geom_histogram(stat = "identity", position = "dodge") +
  theme(legend.position ='none',axis.title = element_text(),axis.text.x = element_text(size = 9)) +
  xlab("Month") + 
  ylab("Count") +
  geom_text(aes(label = n), vjust = -0.1, size = 3.75) +
  ggtitle('Count of Rat Sightings by Month') + 
  scale_fill_gradientn(name = '',colours = rev(brewer.pal(10,'Spectral'))) 

by_month

rat_monthly <-
rats_raw %>%
  group_by(sighting_month, sighting_year) %>%
  mutate(rat_per_month = n()) %>%
  slice(1) %>%
  select(sighting_month, sighting_year, rat_per_month)

test_month <- broom::tidy(oneway.test(rat_per_month ~ sighting_month, data = rat_monthly))

The most rat sightings are in the summer months, with a peak in July. Sightings taper off in the fall, reaching a low in December, and then start to increase in the spring. Warmer weather is more favorable to rat survival and helps their populations grow. Running a one-way ANOVA test on the number of rat sightings by month, we see that the p-value is 7.6361763 × 10^-8, so at least one of the month means is not equal to the others.

Rat Sightings by Day of the Week

by_day <- rats_raw %>% 
  group_by(sighting_weekday) %>% 
  count() %>% 
  ggplot(aes(x = sighting_weekday, y = n, fill = n)) + 
  geom_histogram(stat = "identity", position = "dodge") +
  theme(legend.position ='none',axis.title = element_text(),axis.text.x = element_text(size = 12)) +
  xlab("Weekday") + 
  ylab("Count") +
  geom_text(aes(label = n), vjust = -0.1, size = 4) +
  ggtitle('Count of Rat Sightings by Day of Week') + 
  scale_fill_gradientn(name = '',colours = rev(brewer.pal(10,'Spectral'))) 

by_day

rat_dow <-
rats_raw %>%
  group_by(sighting_weekday, sighting_month, sighting_year) %>%
  mutate(rat_per_month = n()) %>%
  slice(1) %>%
  select(sighting_weekday, sighting_month, sighting_year, rat_per_month)

test_dow <- broom::tidy(oneway.test(rat_per_month ~ sighting_weekday, data = rat_dow))

Weekdays have the most rat sightings, peaking on Mondays and staying relatively high throughout the week, while weekends have much lower counts. Running a one-way ANOVA test on the number of rat sightings by day of the week, we see that the p-value is 7.6361763 × 10^-8, so at least one of the day-of-the-week means is not equal to the others.

Rat Sightings by Location Type

for_location_type <- rats_raw %>% 
  drop_na(location_type) %>%
  filter(location_type != "Other (Explain Below)") %>%
  group_by(location_type) %>%
  mutate(count_loc = n()) %>%
  ungroup() %>%
  filter(location_type %in% c("3+ Family Apt. Building", "1-2 Family Dwelling", "3+ Family Mixed Use Building", "Commercial Building", "Vacant Lot", "Construction Site"))

ggplot(data = for_location_type, aes(x = fct_infreq(location_type))) + 
  geom_bar() +
  theme_minimal() + 
  coord_flip() +
  labs(title = "Top 6 Location Types for Sightings",
       x = "Location Type",
       y = "Count")

rat_location <-
rats_raw %>%
  filter(location_type %in% c("3+ Family Apt. Building", "1-2 Family Dwelling", "3+ Family Mixed Use Building", "Commercial Building", "Vacant Lot", "Construction Site")) %>%
  group_by(location_type, sighting_month, sighting_year) %>%
  mutate(rat_per_month = n()) %>%
  slice(1) %>%
  select(location_type, sighting_month, sighting_year, rat_per_month)

test_location <- broom::tidy(oneway.test(rat_per_month ~ location_type, data = rat_location))

The plot above shows the top 6 location types for rat sightings. 3+ Family Apt. Buildings report the highest number of rat sightings among all location types, while 1-2 Family Dwellings and 3+ Family Mixed Use Buildings report the next two highest numbers. These location types are followed by commercial buildings, vacant lots, and construction sites. Running a one-way ANOVA test on monthly rat sightings by the top 6 location types gives a p-value of 5.9127557 × 10^-157, so at least one of the location type means is not equal to the others.

Interactive Maps

In order to display rat sightings across New York City, we opted to create interactive maps. The first shows all rat sightings and their geo-location while the second is a heat map.

Overall Sightings Map and Heat Map

top <- 40.917577     # north lat
left <- -74.259090   # west long
right <- -73.700272  # east long
bottom <- 40.477399  # south lat

# Keep only sightings inside the NYC bounding box
# (filter also drops rows with missing coordinates)
nyc <- rats_raw %>%
  filter(latitude >= bottom,
         latitude <= top,
         longitude >= left,
         longitude <= right)

center_lon <- median(nyc$longitude, na.rm = TRUE)
center_lat <- median(nyc$latitude, na.rm = TRUE)

# rats_raw has no count column n, so each sighting gets equal weight
nyc %>%
  leaflet() %>%
  addProviderTiles("Esri.NatGeoWorldMap") %>%
  addHeatmap(lng = ~longitude, lat = ~latitude, blur = 20, max = 0.05, radius = 15) %>%
  setView(lng = center_lon, lat = center_lat, zoom = 10)
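If a per-location weight is wanted for the heat map, one option is to snap sightings to a coarse grid and count them per cell. The following is a minimal sketch under that assumption, in base R with invented coordinates; the column names lat_bin, lon_bin, and n are hypothetical and do not exist in the rat data.

```r
# Toy coordinates; values are invented for illustration.
pts <- data.frame(
  latitude  = c(40.7128, 40.7131, 40.7129, 40.6501, 40.6502),
  longitude = c(-74.0060, -74.0061, -74.0059, -73.9496, -73.9497)
)

# Snap each point to a grid of 0.001 degrees (roughly 110 m in latitude).
pts$lat_bin <- round(pts$latitude, 3)
pts$lon_bin <- round(pts$longitude, 3)

# Count sightings per grid cell; n can then serve as a heat map weight.
pts$n <- 1
binned <- aggregate(n ~ lat_bin + lon_bin, data = pts, FUN = sum)
print(binned)
```

A table like binned could then be passed to addHeatmap with lng = ~lon_bin, lat = ~lat_bin, and intensity = ~n.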

Additional Analyses

We first loaded the packages needed to build regression models, then re-imported and cleaned the rat dataset, adding a season variable.

library(tidyverse)
library(lubridate)
library(readr) 
library("ggplot2") 
library("dplyr")
library(xts)
library("lubridate")
library("RColorBrewer")
library("ggthemes")
library("gridExtra")
library("leaflet")
library("highcharter")
library(scales)
library(leaflet.extras)
library(modelr)
library(broom)

rats_raw <- read.csv("./Rat_Sightings.csv", na.strings = c("", "NA", "N/A", "Unspecified")) %>%
  janitor::clean_names() %>% 
  mutate(created_date = mdy_hms(created_date)) %>%
  mutate(sighting_year = year(created_date),
         sighting_month_num = month(created_date),
         sighting_month = month(created_date, label = TRUE, abbr = FALSE),
         sighting_day = day(created_date),
         sighting_weekday = wday(created_date, label = TRUE, abbr = FALSE)) %>%
  mutate(season = case_when(
    sighting_month_num %in% c(12, 1, 2) ~ "Winter",
    sighting_month_num %in% c(3, 4, 5) ~ "Spring",
    sighting_month_num %in% c(6, 7, 8) ~ "Summer",
    sighting_month_num %in% c(9, 10, 11) ~ "Fall",
    TRUE ~ "Unknown"  # Catch-all for unexpected values
  )) %>%
  mutate(season = as.factor(season))

AIC Optimization

In order to assess whether location_type, sighting_year, borough and season are useful predictors of rats per month (rat_per_month), we used a stepwise selection algorithm: the stepAIC function from the MASS package, which performs stepwise model selection by minimizing AIC.

rat_stepwise <-
rats_raw %>%
  filter(location_type %in% c("3+ Family Apt. Building", "1-2 Family Dwelling", "3+ Family Mixed Use Building", "Commercial Building", "Vacant Lot", "Construction Site")) %>%
  group_by(location_type, sighting_month_num, sighting_year, borough, season) %>%
  mutate(rat_per_month = n()) %>%
  slice(1) %>%
  select(location_type, sighting_month_num, sighting_year, borough, rat_per_month, season) %>% 
  mutate(sighting_month_num = as.numeric(sighting_month_num))
model_step = lm(rat_per_month ~ location_type + sighting_year + borough + season, data = rat_stepwise)
stepcount <- MASS::stepAIC(model_step, direction = "both", trace = FALSE) %>% broom::tidy()
knitr::kable(stepcount, digits = 3)
| term | estimate | std.error | statistic | p.value |
|:--|--:|--:|--:|--:|
| (Intercept) | -5462.885 | 293.531 | -18.611 | 0.000 |
| location_type3+ Family Apt. Building | 63.280 | 1.961 | 32.276 | 0.000 |
| location_type3+ Family Mixed Use Building | -35.985 | 2.013 | -17.873 | 0.000 |
| location_typeCommercial Building | -37.452 | 1.966 | -19.054 | 0.000 |
| location_typeConstruction Site | -49.862 | 2.073 | -24.047 | 0.000 |
| location_typeVacant Lot | -45.851 | 1.986 | -23.087 | 0.000 |
| sighting_year | 2.735 | 0.146 | 18.790 | 0.000 |
| boroughBROOKLYN | 29.594 | 1.816 | 16.298 | 0.000 |
| boroughMANHATTAN | 9.007 | 1.819 | 4.951 | 0.000 |
| boroughQUEENS | -9.482 | 1.828 | -5.186 | 0.000 |
| boroughSTATEN ISLAND | -33.025 | 1.939 | -17.036 | 0.000 |
| seasonSpring | 2.597 | 1.640 | 1.584 | 0.113 |
| seasonSummer | 11.766 | 1.630 | 7.219 | 0.000 |
| seasonWinter | -11.602 | 1.675 | -6.925 | 0.000 |

The stepwise procedure retained all 4 variables that we thought would be useful in predicting the number of rats per month: location_type, sighting_year, borough and season.
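To make the selection step more concrete, here is a small sketch of stepwise AIC selection on simulated data. The variables x1, x2, group, and y are invented for illustration: y depends on x1 and group, while x2 is pure noise. It assumes the MASS package is installed and is not a reproduction of the analysis above.

```r
# Toy data: y depends on x1 and group, while x2 is pure noise.
set.seed(23)
n <- 200
toy <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  group = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
toy$y <- 5 + 3 * toy$x1 + 2 * (toy$group == "b") + rnorm(n)

# Start from the full model and let stepAIC add or drop terms by AIC.
full <- lm(y ~ x1 + x2 + group, data = toy)
selected <- MASS::stepAIC(full, direction = "both", trace = FALSE)

# Terms kept in the selected model, and the AIC of both fits.
kept <- attr(terms(selected), "term.labels")
print(kept)
print(c(full = AIC(full), selected = AIC(selected)))
```

Starting from the full model, the procedure only moves to a model with a lower AIC, so the selected fit never has a higher AIC than the starting one.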

rat_location <-
rats_raw %>%
  filter(location_type %in% c("3+ Family Apt. Building", "1-2 Family Dwelling", "3+ Family Mixed Use Building", "Commercial Building", "Vacant Lot", "Construction Site")) %>%
  group_by(location_type, borough , sighting_year, season) %>%
  mutate(rat_per_month = n()) %>%
  slice(1) %>%
  select(location_type, borough, sighting_year, rat_per_month, season) 

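We built a model that predicts rat sightings (rat_per_month) from sighting_year, borough, location_type, and season. Because rat_location is grouped here by location type, borough, year, and season, rat_per_month holds the count for each season of a year rather than for a single month.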

model <- lm(rat_per_month ~ sighting_year + borough + location_type + season, data = rat_location)
summary(model)
## 
## Call:
## lm(formula = rat_per_month ~ sighting_year + borough + location_type + 
##     season, data = rat_location)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -307.24  -49.50   -4.39   43.20  893.99 
## 
## Coefficients:
##                                             Estimate Std. Error t value
## (Intercept)                               -1.505e+04  1.393e+03 -10.806
## sighting_year                              7.541e+00  6.908e-01  10.917
## boroughBROOKLYN                            8.800e+01  8.772e+00  10.031
## boroughMANHATTAN                           2.635e+01  8.772e+00   3.004
## boroughQUEENS                             -2.701e+01  8.779e+00  -3.076
## boroughSTATEN ISLAND                      -8.832e+01  8.895e+00  -9.929
## location_type3+ Family Apt. Building       1.864e+02  9.558e+00  19.499
## location_type3+ Family Mixed Use Building -1.030e+02  9.629e+00 -10.693
## location_typeCommercial Building          -1.096e+02  9.558e+00 -11.471
## location_typeConstruction Site            -1.401e+02  9.704e+00 -14.434
## location_typeVacant Lot                   -1.344e+02  9.575e+00 -14.035
## seasonSpring                               7.561e+00  7.861e+00   0.962
## seasonSummer                               3.429e+01  7.842e+00   4.372
## seasonWinter                              -3.393e+01  7.909e+00  -4.291
##                                           Pr(>|t|)    
## (Intercept)                                < 2e-16 ***
## sighting_year                              < 2e-16 ***
## boroughBROOKLYN                            < 2e-16 ***
## boroughMANHATTAN                           0.00271 ** 
## boroughQUEENS                              0.00213 ** 
## boroughSTATEN ISLAND                       < 2e-16 ***
## location_type3+ Family Apt. Building       < 2e-16 ***
## location_type3+ Family Mixed Use Building  < 2e-16 ***
## location_typeCommercial Building           < 2e-16 ***
## location_typeConstruction Site             < 2e-16 ***
## location_typeVacant Lot                    < 2e-16 ***
## seasonSpring                               0.33627    
## seasonSummer                              1.31e-05 ***
## seasonWinter                              1.89e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 113.1 on 1640 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.5871, Adjusted R-squared:  0.5838 
## F-statistic: 179.4 on 13 and 1640 DF,  p-value: < 2.2e-16

K-Fold Cross-Validation

We then used k-fold cross-validation to create training and testing data sets and fit a new linear regression model on each of the training sets.

set.seed(23) 
rats_folds <- crossv_kfold(rat_location, k = 10)
rats_folds <- rats_folds %>% mutate(model = map(train, ~ lm(rat_per_month ~ ., data = .)))

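We generated predictions using the models fitted during k-fold cross-validation. Note that the code below applies each model to its own training split (train), so the resulting residuals are in-sample; applying it to the held-out split (test) instead would give out-of-sample predictions.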

prediction <- 
  rats_folds %>%
  mutate(predicted = map2(model, train, ~ augment(.x, newdata = .y))) %>% 
  unnest(predicted)
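For comparison, the following sketch shows out-of-sample k-fold cross-validation written by hand in base R. The toy data frame and the simple model y ~ x are assumptions for illustration and are unrelated to the rat data. Each model is fit on k - 1 folds and evaluated only on the held-out fold.

```r
# Toy data standing in for the aggregated counts; values are invented.
set.seed(23)
n <- 100
toy <- data.frame(x = runif(n, 0, 10))
toy$y <- 2 + 1.5 * toy$x + rnorm(n)

# Assign each row to one of k folds at random.
k <- 10
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, predict the held-out fold.
rmse <- sapply(seq_len(k), function(i) {
  train <- toy[fold != i, ]
  test  <- toy[fold == i, ]
  fit   <- lm(y ~ x, data = train)
  pred  <- predict(fit, newdata = test)
  sqrt(mean((test$y - pred)^2))
})

# One out-of-sample RMSE per fold, plus their average.
print(round(rmse, 3))
print(mean(rmse))
```

In a modelr pipeline, the analogous choice is which split (train or test) is handed to augment as newdata.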

Residuals Plot

We created a residual plot to visually assess the performance of our model.

prediction <- prediction %>% 
  mutate(residual = .fitted - rat_per_month)

prediction%>%
  ggplot(aes(rat_per_month, residual)) +
    geom_hline(yintercept = 0) +
    geom_point() +
    stat_smooth(method = "loess") +
    theme_minimal() +
    coord_cartesian(xlim = c(0, 500))

This regression predicts the number of rats per month using the predictors identified in the analysis earlier in the report. Reviewing the plot of rat_per_month against the residuals, we see that the data points are fairly evenly spread out and the smoothed line fits the data reasonably well. The variables chosen were not heavily correlated with each other, giving us a more accurate representation of the actual data. We found an adjusted R^2 value of 0.5838. This is lower than the R^2 values of the latitude and longitude models in the next section, meaning that less of the variability is explained by the model, but it is a more realistic depiction of our dataset.

Predictive Regression Modeling

We first clean the data set further, keeping the predictor variables that our background research suggested have the most effect on rat sightings, including borough and location type.

cleaner_rats <- 
  rats_raw %>%
  drop_na(descriptor, location_type, incident_address, incident_zip,street_name, borough, latitude, longitude) %>%
  select(unique_key, agency, descriptor, location_type, incident_address, incident_zip, street_name, borough,latitude, longitude) %>%
  drop_na() %>%
  janitor::clean_names()

cleaner_rats <- as.data.frame(unclass(cleaner_rats),stringsAsFactors=TRUE)
cleaner_rats
sample_1 <- cleaner_rats[sample(nrow(cleaner_rats), 500), ]

We take the cleaned data set from the previous step, convert its character columns to factors, and draw a random sample of 500 sightings to use in the linear models below.

model1 <-  lm(latitude ~ borough + location_type + incident_zip + street_name, data = sample_1)
model2 <-  lm(longitude ~ borough + location_type + incident_zip + street_name, data = sample_1)
summary(model1)
summary(model2)

P Value Variable Evaluation

Here we run a simple linear model on all of the variables to see which have the greatest effect on our outcomes, latitude and longitude. Based on their p-values, street_name, borough, location_type, and incident_zip are the most important factors.

set.seed(23) 
cleaner_rats1 <- 
  sample_1 %>% 
  select(location_type, borough,incident_zip, street_name, latitude)

rats_folds <- crossv_kfold(cleaner_rats1, k = 10)

rats_folds <- rats_folds %>% mutate(model4 = map(train, ~ lm(latitude ~ ., data = .)))

rats_folds$model4[[1]] %>% summary()

K-Fold Cross-Validation

From here we use k-fold cross-validation to split the data into training and testing sets. Splitting the sample into 10 folds lets us fit the model on different subsets of the data and see how consistently it predicts latitude and longitude.

set.seed(23) 
cleaner_rats2 <- 
  sample_1 %>% 
  select(location_type, borough, street_name, incident_zip, longitude)

rats_folds2 <- crossv_kfold(cleaner_rats2, k = 10)

rats_folds2 <- rats_folds2 %>% mutate(model = map(train, ~ lm(longitude ~ ., data = .)))

rats_folds2$model[[1]] %>% summary()
library(broom)

prediction <- 
  rats_folds2 %>%
  mutate(predicted = map2(model, train, ~ augment(.x, newdata = .y))) %>% 
  unnest(predicted)
prediction

Now we apply the linear models fitted on the training sets from the k-fold split and ask: where would we expect another rat sighting to occur? Our prediction is biased and may not be as accurate as other methods, because the model relies on factor (categorical) predictors.
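One practical consequence of using factor predictors can be sketched as follows. The data below is invented, and street stands in for a high-cardinality factor such as street_name. A model fit by lm cannot predict for a factor level it never saw during fitting, which matters when a held-out fold contains streets absent from the training fold.

```r
# Toy data: a numeric outcome and a factor predictor with three levels.
train <- data.frame(
  y = c(1.0, 1.2, 2.1, 1.9, 3.0, 3.2),
  street = factor(c("A", "A", "B", "B", "C", "C"))
)
new_obs <- data.frame(street = factor("D"))  # level never seen in training

fit <- lm(y ~ street, data = train)

# predict() stops with an error when newdata contains an unseen level;
# tryCatch captures the error object instead of halting the script.
result <- tryCatch(
  predict(fit, newdata = new_obs),
  error = function(e) e
)
print(inherits(result, "error"))
print(conditionMessage(result))
```

Possible workarounds, none of which are used in this report, include lumping rare levels into an "Other" category or replacing the factor with numeric features.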

Longitude Residual Plot

We assess the fit by looking at the residuals.

prediction <- prediction %>% 
  mutate(residual = .fitted - longitude)

prediction%>%
  ggplot(aes(longitude, residual)) +
    geom_hline(yintercept = 0) +
    geom_point() +
    stat_smooth(method = "loess") +
    theme_minimal()

R^2 of Longitude Residuals

Looking at the residuals from our predictions, we would expect another rat sighting to occur between about -74.0 and -73.9 degrees longitude.

rs <- prediction %>%
  group_by(.id) %>% 
  summarise(
    sst = sum((longitude - mean(longitude)) ^ 2), # Sum of Squares Total
    sse = sum(residual ^ 2),          # Sum of Squares Residual/Error
    r.squared = 1 - sse / sst         # Proportion of variance accounted for
    )

rs %>% 
  ggplot(aes(r.squared, fill  = .id)) +
    geom_histogram() +
    geom_vline(aes(xintercept = mean(r.squared)))
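For reference, the per-fold quantities computed in the summarise call above correspond to the usual definition of R^2. In the notation below, which is introduced here only for illustration, y_i is an observed longitude within a fold, \hat{y}_i is its fitted value, and \bar{y} is the fold mean.

```latex
\[
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
\mathrm{SSE} = \sum_{i} (\hat{y}_i - y_i)^2, \qquad
\mathrm{SST} = \sum_{i} (y_i - \bar{y})^2
\]
```

Because the residual is squared, defining it as fitted minus observed (as in the code) gives the same SSE as the more common observed minus fitted.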

Once again we want to see how accurate our model is, so we compute the R^2 value for each fold and find that they are all around 0.9915, depending on the random sample drawn from the data set. Note that we do not have an adjusted R^2 value for this model, and the very high values most likely reflect overfitting.

Now we repeat the process on the latitude data.

library(broom)

prediction2 <- 
  rats_folds %>%
  mutate(predicted = map2(model4, train, ~ augment(.x, newdata = .y))) %>% 
  unnest(predicted)
prediction2

Residual Plot of Latitude

prediction2 <- prediction2 %>% 
  mutate(residual = .fitted - latitude)

prediction2%>%
  ggplot(aes(latitude, residual)) +
    geom_hline(yintercept = 0) +
    geom_point() +
    stat_smooth(method = "loess") +
    theme_minimal()

Looking at the residuals, we would expect another rat sighting to occur around 40.7 degrees latitude.

R^2 of Latitude Residuals

rs2 <- prediction2 %>%
  group_by(.id) %>% 
  summarise(
    sst = sum((latitude - mean(latitude)) ^ 2), # Sum of Squares Total
    sse = sum(residual ^ 2),          # Sum of Squares Residual/Error
    r.squared = 1 - sse / sst         # Proportion of variance accounted for
    )

rs2 %>% 
  ggplot(aes(r.squared, fill  = .id)) +
    geom_histogram() +
    geom_vline(aes(xintercept = mean(r.squared)))

We notice that the R^2 value is higher for the latitude model, at around 0.9972. As before, this value is not adjusted, most likely reflects overfitting, and depends on the random sample chosen from the data set.

Overall, this result makes sense: when we plot the expected values on a latitude and longitude map, they land in New York City (Manhattan).



Discussion


  • When analyzing our data, we considered many different factors that could contribute to the variation in rat sightings in NYC.

  • We believe that the sharp increase in rat sightings at the end of 2020 and going into 2021 could be a result of the pandemic. Since most restaurants and establishments were closed and fewer people filled the streets during the lockdown, the rats were able to scavenge outside without interruption. They were most likely able to find food and shelter and to reproduce without people and cars scaring them away. This could have led to an increase in the rat population and more rat sightings. The pandemic also led to an increase in outdoor dining, and these dining areas are still being used today. This could explain the continued high number of rat sightings, since there are more food scraps and crumbs outside for the rats to find.

  • More rat sightings are also observed during the summer months, which can be explained by weather that is warmer and wetter than during the rest of the year. Since the weather is warm, the rats are able to scurry around the streets comfortably instead of having to shelter for warmth and safety. Because of the temperature, rats normally have their babies in the spring months, which also helps explain the increase in rat sightings during the summer.

  • We also found that rat sightings are highest during the weekdays (especially towards the beginning of the week) and lowest during the weekend. This can be attributed to people leaving their houses more on weekdays for work, errands, and other obligations, compared to staying in their houses on the weekends.

  • When analyzing rat sightings by location type, we see that rats are seen most often in 3+ family apt. buildings. This may be because these are usually large buildings maintained with comfortable living conditions such as heating and air conditioning, a clean environment, and kitchens full of food. That sounds like a perfect place for a rat to settle down. Next in number of sightings are 1-2 family dwellings and 3+ family mixed use buildings, which can be explained by the same argument. Rats are spotted less often in commercial buildings, vacant lots, and construction sites. This is most likely because these types of locations are not conducive to a rat's survival: there is probably little to no food available and living conditions are harsh and uncomfortable. Rats especially would not want to nest around construction sites, since they are dangerous and sterile.

Our heat map of rat sightings shows that rats have pretty much taken over the entire city. However, as you zoom in, there are denser pockets of rat sightings, which could be attributed to restaurants or apartment buildings. On the map alone, no single borough appears to have markedly fewer rats than the rest.